Our earlier research built a virtual shake robot in simulation to study the dynamics of precariously balanced rocks (PBR), which are negative indicators of earthquakes in nature. These simulation studies need validation through physical experiments. For this purpose, we developed Shakebot, a low-cost (under $2,000), open-source shake table to validate simulations of PBR dynamics and facilitate other ground motion experiments. The Shakebot is a custom one-dimensional prismatic robotic system with perception and motion software developed using the Robot Operating System (ROS). We adapted affordable, high-accuracy components from 3D printers, particularly a closed-loop stepper motor for actuation and a toothed belt for transmission. The stepper motor enables the bed to reach a maximum horizontal acceleration of 11.8 m/s^2 (1.2 g) and a maximum velocity of 0.5 m/s when loaded with a 2 kg scale-model PBR. The perception system of the Shakebot consists of an accelerometer and a high frame-rate camera. By fusing camera-based displacement measurements with accelerometer readings, the Shakebot achieves accurate bed velocity estimation. The ROS-based perception and motion software simplifies the transition of code from our previous virtual shake robot to the physical Shakebot. The reuse of the control programs ensures that the implemented ground motions are consistent between the simulation and physical experiments, which is critical for validating our simulation experiments.
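For intuition, here is a minimal sketch of how camera-based displacements and accelerometer readings could be fused into a bed-velocity estimate using a simple complementary filter; the filter structure, signal names, and blend factor are illustrative assumptions, not the Shakebot's actual estimator.

```python
import numpy as np

def estimate_bed_velocity(cam_disp, accel, dt, alpha=0.98):
    """Fuse camera displacements and accelerometer data into a velocity estimate.

    cam_disp : 1-D array of bed displacements from the high frame-rate camera [m]
    accel    : 1-D array of bed accelerations from the accelerometer [m/s^2]
    dt       : common sample period of the (resampled) signals [s]
    alpha    : blend factor; larger values trust the integrated accelerometer
               more at each step (hypothetical tuning constant)
    """
    v_cam = np.gradient(cam_disp, dt)          # low-drift velocity from vision
    v_est = np.zeros_like(v_cam)
    for k in range(1, len(v_cam)):
        v_pred = v_est[k - 1] + accel[k] * dt  # propagate with the accelerometer
        v_est[k] = alpha * v_pred + (1 - alpha) * v_cam[k]  # correct with vision
    return v_est
```

The camera anchors the low-frequency content (no integration drift), while the accelerometer supplies the high-frequency detail between frames.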
In this paper, we propose Adam-Hash: an adaptive and dynamic multi-resolution hashing data-structure for fast pairwise summation estimation. Given a data-set $X \subset \mathbb{R}^d$, a binary function $f:\mathbb{R}^d\times \mathbb{R}^d\to \mathbb{R}$, and a point $y \in \mathbb{R}^d$, the Pairwise Summation Estimate is defined as $\mathrm{PSE}_X(y) := \frac{1}{|X|} \sum_{x \in X} f(x,y)$. For any given data-set $X$, we need to design a data-structure such that, given any query point $y \in \mathbb{R}^d$, it approximately estimates $\mathrm{PSE}_X(y)$ in time that is sub-linear in $|X|$. Prior works on this problem have focused exclusively on the case where the data-set is static and the queries are independent. In this paper, we design a hashing-based PSE data-structure which works in the more practical \textit{dynamic} setting, in which insertions, deletions, and replacements of points are allowed. Moreover, our proposed Adam-Hash is also robust to adaptive PSE queries, where an adversary can choose query $q_j \in \mathbb{R}^d$ depending on the output from previous queries $q_1, q_2, \dots, q_{j-1}$.
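For reference, a brute-force computation of $\mathrm{PSE}_X(y)$ takes $O(|X|)$ time per query. The sketch below shows this baseline, which is the quantity Adam-Hash is designed to approximate in sub-linear time; the Gaussian kernel is only an illustrative choice of the binary function $f$.

```python
import numpy as np

def pse_exact(X, y, f):
    """Exact pairwise summation estimate PSE_X(y) = (1/|X|) * sum_x f(x, y).

    This reference implementation scans the whole data-set, so it runs in
    O(|X|) per query; a sub-linear data-structure would only approximate it.
    """
    return sum(f(x, y) for x in X) / len(X)

# Illustrative usage with random data and a Gaussian kernel as f.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
y = rng.normal(size=16)
gauss = lambda a, b: np.exp(-np.linalg.norm(a - b) ** 2)
print(pse_exact(X, y, gauss))
```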
In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems for the SLU intent detection task: 1) text-based, 2) lattice-based, and 3) a novel multimodal approach. Our work provides a comprehensive analysis of the achievable performance of different state-of-the-art SLU systems under different circumstances, e.g., automatically- vs. manually-generated transcripts. We evaluate the systems on the publicly available SLURP spoken language resource corpus. Our results indicate that using richer forms of Automatic Speech Recognition (ASR) outputs allows SLU systems to improve in comparison to the 1-best setup (4% relative improvement). However, crossmodal approaches, i.e., learning from acoustic and text embeddings, obtain performance similar to the oracle setup and a relative improvement of 18% over the 1-best configuration. Thus, crossmodal architectures represent a good alternative for overcoming the limitations of working with purely automatically generated textual data.
We revisit a simple Learning-from-Scratch baseline for visuo-motor control that uses data augmentation and a shallow ConvNet. We find that this baseline has competitive performance with recent methods that leverage frozen visual representations trained on large-scale vision datasets.
Developing robots that are capable of many skills and generalization to unseen scenarios requires progress on two fronts: efficient collection of large and diverse datasets, and training of high-capacity policies on the collected data. While large datasets have propelled progress in other fields like computer vision and natural language processing, collecting data of comparable scale is particularly challenging for physical systems like robotics. In this work, we propose a framework to bridge this gap and better scale up robot learning, under the lens of multi-task, multi-scene robot manipulation in kitchen environments. Our framework, named CACTI, has four stages that separately handle data collection, data augmentation, visual representation learning, and imitation policy training. In the CACTI framework, we highlight the benefit of adapting state-of-the-art models for image generation as part of the augmentation stage, and the significant improvement of training efficiency by using pretrained out-of-domain visual representations at the compression stage. Experimentally, we demonstrate that 1) on a real robot setup, CACTI enables efficient training of a single policy capable of 10 manipulation tasks involving kitchen objects, and robust to varying layouts of distractor objects; 2) in a simulated kitchen environment, CACTI trains a single policy on 18 semantic tasks across up to 50 layout variations per task. The simulation task benchmark and augmented datasets in both real and simulated environments will be released to facilitate future research.
Poor sample efficiency continues to be the primary challenge for deployment of deep Reinforcement Learning (RL) algorithms in real-world applications, in particular for visuo-motor control. Model-based RL has the potential to be highly sample efficient by concurrently learning a world model and using synthetic rollouts for planning and policy improvement. However, in practice, sample-efficient learning with model-based RL is bottlenecked by the exploration challenge. In this work, we find that leveraging just a handful of demonstrations can dramatically improve the sample-efficiency of model-based RL. Simply appending demonstrations to the interaction dataset, however, does not suffice. We identify key ingredients for leveraging demonstrations in model learning -- policy pretraining, targeted exploration, and oversampling of demonstration data -- which form the three phases of our model-based RL framework. We empirically study three complex visuo-motor control domains and find that our method is 150%-250% more successful in completing sparse reward tasks compared to prior approaches in the low data regime (100K interaction steps, 5 demonstrations). Code and videos are available at: https://nicklashansen.github.io/modemrl
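For illustration, one way to oversample demonstration data during model and policy updates is to reserve a fixed fraction of every training batch for demonstration transitions. The sketch below shows this idea; the oversampling ratio and buffer layout are assumptions, not the paper's exact recipe.

```python
import numpy as np

def sample_batch(demo_buffer, interaction_buffer, batch_size, demo_frac=0.25):
    """Draw a training batch that oversamples demonstration transitions.

    demo_frac is a hypothetical ratio: demonstrations fill a fixed fraction of
    every batch even though they are a tiny fraction of all collected data.
    """
    n_demo = int(batch_size * demo_frac)
    n_inter = batch_size - n_demo
    demo_idx = np.random.randint(len(demo_buffer), size=n_demo)
    inter_idx = np.random.randint(len(interaction_buffer), size=n_inter)
    batch = [demo_buffer[i] for i in demo_idx] + \
            [interaction_buffer[i] for i in inter_idx]
    np.random.shuffle(batch)
    return batch
```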
In this era of pandemic, the future of the healthcare industry has never been more exciting. Artificial intelligence and machine learning (AI & ML) present opportunities to develop solutions that cater to very specific needs within the industry. Deep learning in healthcare has become incredibly powerful for supporting clinics and transforming patient care in general. Deep learning is increasingly being applied to detect clinically important features in images beyond what can be perceived by the naked human eye. Chest X-ray images are one of the most common clinical methods for diagnosing a number of diseases such as pneumonia and lung cancer, as well as other abnormalities like lesions and fractures. Proper diagnosis of a disease from X-ray images is often a challenging task even for expert radiologists, and there is a growing need for computerized support systems due to the large amount of information encoded in X-ray images. The goal of this paper is to develop a lightweight solution to detect 14 different chest conditions from an X-ray image. Given an X-ray image as input, our classifier outputs a label vector indicating which of the 14 disease classes the image falls into. Along with the image features, we also use non-image features available in the data, such as X-ray view type, age, and gender. The original study conducted by the Stanford ML Group is our baseline. The original study focuses on predicting 5 diseases. Our aim is to improve upon previous work, expand the prediction to 14 diseases, and provide insight for future chest radiography research.
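As a rough illustration of how image and non-image features can be combined for multi-label prediction over the 14 conditions, the sketch below concatenates pretrained image features with tabular metadata; the feature dimensions and layer sizes are assumptions, not the architecture used in this work.

```python
import torch
import torch.nn as nn

class ChestXrayClassifier(nn.Module):
    """Multi-label head over 14 chest conditions that combines CNN image
    features with tabular metadata (view type, age, sex)."""

    def __init__(self, image_feat_dim=1280, meta_dim=3, num_classes=14):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(image_feat_dim + meta_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, image_features, metadata):
        x = torch.cat([image_features, metadata], dim=1)
        return self.head(x)  # raw logits, one per condition

# Training would use a multi-label loss such as nn.BCEWithLogitsLoss(), so the
# 14 conditions are predicted independently rather than as mutually exclusive classes.
```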
The configuration manifolds of parallel manipulators exhibit more nonlinearity than those of serial manipulators. Qualitatively, they can be seen to possess additional folds. When such a manifold is projected onto spaces of engineering relevance, such as the output workspace or the input actuator space, the edges of these folds appear as boundaries exhibiting non-smooth behavior. For example, several local workspace boundaries appear within the global workspace boundary of a five-bar linkage, and these local boundaries apply only to certain output modes of the mechanism. When these projections are studied exclusively, rather than the configuration manifold itself, the presence of such boundaries manifests in both the input and output projections. In particular, the design of asymmetric parallel manipulators has been hampered by these exotic projections in their input and output spaces. In this paper, we represent the configuration space with a radius graph and then weight each edge by a transmission quality quantified using homotopy continuation. We then employ a graph path planner to approximate geodesics between configuration points that avoid regions of poor transmission quality. Our approach automatically generates paths capable of transitioning between non-neighboring output modes, a motion that involves crossing multiple workspace boundaries (local, global, or both). We apply the technique to two asymmetric five-bar examples, which show how the transmission properties and other features of the workspace can be selected by switching output modes.
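As an illustration of the planning step, the sketch below builds a radius graph over sampled configurations and runs a shortest-path search whose edge costs grow as transmission quality drops; the quality function is treated as an assumed black box standing in for the homotopy-continuation computation described above.

```python
import networkx as nx
import numpy as np

def plan_mode_switching_path(configs, radius, transmission_quality, start, goal):
    """Approximate a geodesic between two configuration points on a radius graph.

    configs              : (N, d) array of sampled configuration-space points
    radius               : connection radius for the graph
    transmission_quality : callable mapping a configuration to a quality in (0, 1]
                           (assumed black box standing in for homotopy continuation)
    """
    G = nx.Graph()
    for i, q in enumerate(configs):
        G.add_node(i, q=q)
    for i in range(len(configs)):
        for j in range(i + 1, len(configs)):
            d = np.linalg.norm(configs[i] - configs[j])
            if d <= radius:
                quality = min(transmission_quality(configs[i]),
                              transmission_quality(configs[j]))
                G.add_edge(i, j, weight=d / quality)  # long or low-quality edges cost more
    return nx.shortest_path(G, start, goal, weight="weight")
```

Weighting each edge by distance divided by quality biases the search toward well-transmitting regions while still permitting boundary crossings when no better route exists.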
Apprenticeship learning is a framework in which an agent uses example trajectories provided by an expert to learn a policy for performing a given task in an environment. In the real world, one may have access to expert trajectories in different environments whose system dynamics differ while the learning task remains the same. For such scenarios, two types of learning objectives can be defined: one in which the learned policy performs well in one particular environment, and another in which it performs well across all environments. To balance these two objectives in a principled manner, our work introduces the Cross Apprenticeship Learning (CAL) framework. It consists of an optimization problem that seeks an optimal policy for each environment while ensuring that all policies remain close to one another. A tuning parameter in the optimization problem enforces this proximity. We derive properties of the optimizers of the problem as the tuning parameter varies. Since the problem is nonconvex, we provide a convex outer approximation. Finally, we demonstrate the properties of our framework on a navigation task in a windy environment.
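A plausible form of the CAL optimization problem, written under assumed notation rather than the paper's exact formulation, is $\min_{\pi_1,\dots,\pi_N} \sum_{i=1}^{N} \ell_i(\pi_i)$ subject to $d(\pi_i,\pi_j) \le \epsilon$ for all $i,j$, where $\ell_i$ is the apprenticeship-learning objective in environment $i$, $d$ is a distance between policies, and the tuning parameter $\epsilon$ enforces the proximity of the per-environment policies; the convex outer approximation then relaxes this nonconvex feasible set.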
The introduction of computational techniques for analyzing chemical data has given rise to the analytical study of biological systems, known as "bioinformatics". One facet of bioinformatics is the use of machine learning (ML) techniques to detect multivariable trends in various cases. Among the most pressing cases is predicting blood-brain barrier (BBB) permeability. The development of new drugs to treat central nervous system disorders presents unique challenges due to poor penetration efficacy across the blood-brain barrier. In this research, we aim to mitigate this problem through ML models that analyze chemical features. To do so: (i) an overview of the relevant biological systems, processes, and use case is given. (ii) Second, an in-depth literature review of existing computational techniques for detecting BBB permeability is performed, from which a gap across current techniques is identified and a solution is proposed. (iii) Lastly, a two-part solution for quantifying the permeability of drugs with defined features across the BBB via passive diffusion is developed, tested, and reflected upon. Testing and validation with a data set determined that the predictive logBB model has a squared error of approximately 0.112 units and the neuroinflammation model has a mean squared error of approximately 0.3 units, outperforming all related studies.
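As a sketch of the logBB regression step, the snippet below fits a generic regressor to molecular descriptors and reports the test mean squared error; the descriptor set, model choice, and synthetic placeholder data are assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Placeholder data: rows would be molecular descriptors (e.g. molecular weight,
# logP, polar surface area, H-bond counts) and y the measured logBB values.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.3, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

mse = mean_squared_error(y_test, model.predict(X_test))
print(f"logBB test MSE: {mse:.3f}")  # the study reports roughly 0.112 for its model
```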